
    ASR error management for improving spoken language understanding

    This paper addresses the problem of automatic speech recognition (ASR) error detection and its use for improving spoken language understanding (SLU) systems. In this study, the SLU task consists in automatically extracting semantic concepts and concept/value pairs from ASR transcriptions in, for example, a tourist information system. An approach is proposed for enriching the set of semantic labels with error-specific labels and for using a recently proposed neural approach based on word embeddings to compute well-calibrated ASR confidence measures. Experimental results show that it is possible to significantly decrease the Concept/Value Error Rate with a state-of-the-art system, outperforming previously published results on the same experimental data. It is also shown that, by combining an SLU approach based on conditional random fields with a neural attention-based encoder/decoder architecture, it is possible to effectively identify confidence islands and uncertain semantic output segments that are useful for deciding on appropriate error-handling actions in the dialogue manager strategy.
    Comment: Interspeech 2017, Aug 2017, Stockholm, Sweden.
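The label-enrichment idea can be illustrated with a minimal sketch. All labels, words, and the threshold below are hypothetical; the paper's actual system derives confidences from a neural model over word embeddings:

```python
# Minimal sketch of the label-enrichment idea: semantic labels on
# low-confidence ASR tokens are replaced by error-specific variants,
# so the SLU model can learn to flag unreliable concept/value pairs.
# Labels, words, and the 0.5 threshold here are illustrative only.

def enrich_labels(tagged_tokens, threshold=0.5):
    """tagged_tokens: list of (word, concept_label, asr_confidence)."""
    enriched = []
    for word, label, confidence in tagged_tokens:
        if confidence < threshold:
            # Error-specific variant of the semantic label
            enriched.append((word, label + "_ERR"))
        else:
            enriched.append((word, label))
    return enriched

hypothesis = [("chambre", "room-type", 0.92), ("simple", "room-type", 0.31)]
print(enrich_labels(hypothesis))
# -> [('chambre', 'room-type'), ('simple', 'room-type_ERR')]
```

Downstream, a dialogue manager can treat tokens tagged `_ERR` as candidates for clarification requests rather than committing to their values.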

    Benchmarking Transformers-based models on French Spoken Language Understanding tasks

    In the last five years, the rise of self-attentional Transformer-based architectures has led to state-of-the-art performance on many natural language tasks. Although these approaches are increasingly popular, they require large amounts of data and computational resources. There is still a substantial need for benchmarking methodologies for under-resourced languages in data-scarce application conditions. Most pre-trained language models have been massively studied for English, and only a few have been evaluated on French. In this paper, we propose a unified benchmark focused on evaluating model quality and ecological impact on two well-known French spoken language understanding tasks. In particular, we benchmark thirteen well-established Transformer-based models on the two available spoken language understanding tasks for French: MEDIA and ATIS-FR. Within this framework, we show that compact models can reach results comparable to bigger ones while their ecological impact is considerably lower. However, this conclusion is nuanced and depends on the compression method considered.
    Comment: Accepted paper at INTERSPEECH 202
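A benchmark of this kind pairs a quality metric with a cost proxy per model. A toy harness, with an entirely hypothetical interface (the paper's actual protocol measures real ecological impact, not wall-clock time), might look like:

```python
import time

def benchmark(predict, dataset):
    """Return (accuracy, elapsed_seconds) for a model's predict callable.

    Wall-clock time stands in here as a crude cost proxy; a real
    benchmark would measure energy consumption or CO2 instead.
    """
    start = time.perf_counter()
    correct = sum(predict(x) == y for x, y in dataset)
    return correct / len(dataset), time.perf_counter() - start

# Toy comparison of a "compact" vs a "large" model on synthetic data
data = [(i, i % 2) for i in range(1000)]
compact = lambda x: x % 2                                 # cheap model
large = lambda x: (sum(1 for _ in range(200)) and 1) and x % 2  # slower, same output
for name, model in [("compact", compact), ("large", large)]:
    acc, secs = benchmark(model, data)
    print(f"{name}: accuracy={acc:.2f}, time={secs:.4f}s")
```

Reporting both columns side by side is what lets the paper argue that compact models can match larger ones at a fraction of the cost.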

    Lifelong learning and task-oriented dialogue system: what does it mean?

    The main objective of this paper is to propose a functional definition of a lifelong learning system adapted to the framework of task-oriented dialogue systems. We identify two main aspects where lifelong learning technology could be applied in such a system: improving the natural language understanding module and enriching the database used by the system. Given our definition, we present an example of how it could be implemented in an actual task-oriented dialogue system developed in the LIHLITH project.
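The two identified aspects can be caricatured in a short sketch. All class and method names are hypothetical; the paper offers a functional definition, not an implementation:

```python
# Hypothetical sketch of the two lifelong-learning hooks identified in
# the paper: (1) the NLU module improves from confirmed dialogue turns,
# and (2) the database consulted by the system is enriched with new facts.

class LifelongDialogueSystem:
    def __init__(self):
        self.nlu_examples = []   # growing NLU training set
        self.database = {}       # growing domain knowledge base

    def learn_from_turn(self, utterance, confirmed_intent, new_facts):
        # Aspect 1: add a user-confirmed (utterance, intent) pair
        # to the data the NLU module can later be retrained on.
        self.nlu_examples.append((utterance, confirmed_intent))
        # Aspect 2: enrich the database the system queries.
        self.database.update(new_facts)

system = LifelongDialogueSystem()
system.learn_from_turn("book a table at Chez Paul",
                       "book_restaurant",
                       {"Chez Paul": {"type": "restaurant"}})
print(len(system.nlu_examples), len(system.database))
```

The point of the sketch is only that both learning signals come from live dialogues after deployment, which is what distinguishes lifelong learning from one-off supervised training.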

    Semantic enrichment towards efficient speech representations

    Over the past few years, self-supervised speech representations have emerged as fruitful replacements for conventional surface representations when solving Spoken Language Understanding (SLU) tasks. Simultaneously, multilingual models trained on massive textual data were introduced to encode language-agnostic semantics. Recently, the SAMU-XLSR approach introduced a way to profit from such textual models to enrich multilingual speech representations with language-agnostic semantics. Aiming for better semantic extraction on a challenging Spoken Language Understanding task, and with computation costs in mind, this study investigates a specific in-domain semantic enrichment of the SAMU-XLSR model by specializing it on a small amount of transcribed data from the downstream task. In addition, we show the benefits of using same-domain French and Italian benchmarks for low-resource language portability, and we explore cross-domain capacities of the enriched SAMU-XLSR.
    Comment: INTERSPEECH 202
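Enrichment of this kind pulls a pooled speech embedding toward the text embedding of its transcription. A minimal cosine-distance objective, sketched in pure Python (the actual model operates on high-dimensional encoder outputs, and this loss form is an assumption for illustration), could be:

```python
import math

def cosine_loss(speech_vec, text_vec):
    """1 - cosine similarity between a pooled speech embedding and the
    sentence embedding of its transcription. Minimizing this pulls the
    speech representation toward language-agnostic textual semantics."""
    dot = sum(s * t for s, t in zip(speech_vec, text_vec))
    norm_s = math.sqrt(sum(s * s for s in speech_vec))
    norm_t = math.sqrt(sum(t * t for t in text_vec))
    return 1.0 - dot / (norm_s * norm_t)

print(cosine_loss([1.0, 0.0], [1.0, 0.0]))  # identical directions -> 0.0
print(cosine_loss([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 1.0
```

Specializing on a small amount of in-domain transcribed data then means running a few epochs of this objective on downstream-task pairs rather than on the original large-scale corpus.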

    A study of continuous word representations applied to the automatic detection of speech recognition errors

    My thesis concerns a study of continuous word representations (word embeddings) applied to the automatic detection of speech recognition errors. Our study focuses on the use of a neural approach to improve ASR error detection, using word embeddings. The exploitation of continuous word representations is motivated by the fact that ASR error detection consists in locating possible linguistic or acoustic incongruities in automatic transcriptions. The aim is therefore to find the appropriate word representation that makes it possible to capture pertinent information in order to detect these anomalies. Our contribution in this thesis spans several directions. First, we begin with a preliminary study in which we propose a neural architecture able to integrate different types of features, including word embeddings. Second, we propose an in-depth study of continuous word representations, which focuses on the one hand on the evaluation of different types of linguistic word embeddings and their combination in order to take advantage of their complementarities, and on the other hand on acoustic word embeddings. Then, we present a study on the analysis of classification errors, with the aim of identifying the errors that are difficult to detect. Perspectives for improving the performance of our system are also proposed, by modeling the errors at the sentence level. Finally, we exploit the linguistic and acoustic embeddings, as well as the information provided by our ASR error detection system, in several downstream applications.
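The feature-integration step of the thesis, a classifier that consumes heterogeneous per-word descriptors, can be caricatured by simple concatenation. All names and dimensions below are hypothetical:

```python
# Hypothetical sketch: per-word feature vectors for ASR error detection
# are built by concatenating a linguistic embedding, an acoustic
# embedding, and scalar descriptors such as the ASR confidence score.
# A neural classifier (as in the thesis) would then consume each vector.

def build_features(linguistic_emb, acoustic_emb, confidence, duration):
    """Concatenate heterogeneous descriptors into one feature vector."""
    return list(linguistic_emb) + list(acoustic_emb) + [confidence, duration]

features = build_features(
    linguistic_emb=[0.2, -0.1, 0.7],   # e.g. a word2vec-style vector
    acoustic_emb=[0.05, 0.3],          # e.g. an acoustic word embedding
    confidence=0.42,                   # ASR posterior for the word
    duration=0.18,                     # word duration in seconds
)
print(len(features))  # 3 + 2 + 2 = 7
```

Combining linguistic and acoustic views in one vector is what lets the detector catch both kinds of incongruity the abstract mentions.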
